Conversation

codeflash-ai[bot] commented Oct 22, 2025

📄 6,776% (67.76x) speedup for is_passthrough_request_using_router_model in litellm/proxy/pass_through_endpoints/llm_passthrough_endpoints.py

⏱️ Runtime: 84.5 milliseconds → 1.23 milliseconds (best of 151 runs)

📝 Explanation and details

The optimization introduces **caching** to eliminate expensive repeated calls to `llm_router.get_model_names()`.

**Key Changes:**

  • Added a module-level cache `_model_names_cache` that stores `set` objects keyed by router instance ID
  • On the first call for a router, the function fetches the model names, converts them to a set, and caches the result
  • Subsequent calls for the same router use the cached set directly
  • Simplified the membership check to a direct `return model in model_names_set` (a sketch of this pattern follows the list)
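
Below is a minimal sketch of the cached variant, assuming an `id()`-keyed module-level dict and a broad exception guard; the function and cache names come from this description, but the exact body is an inference, not the PR's verbatim code.

```python
# Minimal sketch of the cached lookup described above (an assumption, not the
# PR's exact code). The try/except is inferred from the generated tests, where
# raising routers and unhashable models fall through to False.
from typing import Any, Dict, Optional, Set

_model_names_cache: Dict[int, Set[Any]] = {}  # id(router) -> set of model names


def is_passthrough_request_using_router_model(
    request_body: dict, llm_router: Optional[Any]
) -> bool:
    try:
        model = request_body.get("model")
        if model is None or llm_router is None:
            return False
        cache_key = id(llm_router)
        model_names_set = _model_names_cache.get(cache_key)
        if model_names_set is None:
            # First call for this router: pay for get_model_names() once,
            # then reuse the cached set for every later lookup.
            model_names_set = set(llm_router.get_model_names() or [])
            _model_names_cache[cache_key] = model_names_set
        return model in model_names_set
    except Exception:
        return False
```

One design caveat worth noting: keying by `id(llm_router)` can serve stale names if a router's model list is mutated in place, or if a garbage-collected router's id is reused by a new instance, so a production version would want an invalidation story (for example, a `weakref.WeakKeyDictionary` keyed by the router itself).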

**Why This Creates Massive Speedup:**
The line profiler shows `llm_router.get_model_names()` was the bottleneck, taking 96% of execution time (373ms of 389ms total). This suggests the method is expensive, likely involving I/O operations or complex data processing. By caching the converted set, we:

  1. **Eliminate redundant expensive calls**: `get_model_names()` now runs only once per unique router (50 times vs 2056 times in the profile)
  2. **Avoid repeated set conversion**: the list-to-set conversion (13.2ms in the original) now happens only once per router
  3. **Maintain O(1) lookup performance** while removing the setup overhead (see the micro-benchmark sketch after this list)
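
A hypothetical micro-benchmark (not part of the PR) illustrating point 3: membership in a 1000-element list is a linear scan, while membership in the cached set is a single hash lookup.

```python
# Hypothetical micro-benchmark: O(n) list scan vs O(1) set lookup.
# Absolute numbers vary by machine; the ratio is what matters.
import timeit

names = [f"model-{i}" for i in range(1000)]
names_set = set(names)

list_time = timeit.timeit(lambda: "model-999" in names, number=10_000)
set_time = timeit.timeit(lambda: "model-999" in names_set, number=10_000)
print(f"list membership: {list_time:.4f}s, set membership: {set_time:.4f}s")
```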

**Test Case Performance:**

  • **Large scale tests show the biggest gains** (3000-9000% faster); these benefit most from avoiding repeated expensive operations
  • **Single router, multiple requests** scenarios see dramatic improvements (9590% faster)
  • **Basic edge cases** show modest gains (10-40% faster), since they still benefit from avoiding the setup overhead
  • **Cold start cases** (the first call for a new router) may be slightly slower due to the caching logic, but subsequent calls are much faster

This optimization is particularly effective for applications that repeatedly query the same router instance with different models, which appears to be the common usage pattern based on the test scenarios; a hypothetical warm-path example follows.
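
For context, a hypothetical warm-path usage mirroring `test_large_scale_many_requests` below (using that test's minimal `Router` stub): only the first call pays for `get_model_names()`.

```python
# Hypothetical usage: the first call builds and caches the set (cold path);
# the remaining 999 calls are O(1) lookups against the cached set (warm path).
router = Router(model_list=[{"model_name": f"model-{i}"} for i in range(1000)])

assert is_passthrough_request_using_router_model({"model": "model-0"}, router)
for i in range(1, 1000):
    assert is_passthrough_request_using_router_model({"model": f"model-{i}"}, router)
```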

Correctness verification report:

| Test | Status |
|------|--------|
| ⚙️ Existing Unit Tests | 🔘 None Found |
| 🌀 Generated Regression Tests | ✅ 2068 Passed |
| ⏪ Replay Tests | 🔘 None Found |
| 🔎 Concolic Coverage Tests | 🔘 None Found |
| 📊 Tests Coverage | 100.0% |
🌀 Generated Regression Tests and Runtime
```python
import pytest
from litellm.proxy.pass_through_endpoints.llm_passthrough_endpoints import \
    is_passthrough_request_using_router_model

# --- Test scaffolding ---
# Minimal Router stub for testing, as per the provided code

class Router:
    """
    Minimal Router stub for testing.
    Accepts a model_list (list of dicts with 'model_name' key).
    """
    def __init__(self, model_list=None):
        self._model_list = model_list or []

    def get_model_names(self, team_id=None):
        # Returns all model names in the router (ignores team_id for simplicity)
        return [d.get("model_name", "") for d in self._model_list]

# --- Unit Tests ---

# 1. Basic Test Cases

def test_basic_model_present():
    # Router with two models, request matches one
    router = Router(model_list=[{"model_name": "gpt-3.5-turbo"}, {"model_name": "gpt-4"}])
    request = {"model": "gpt-4"}
    codeflash_output = is_passthrough_request_using_router_model(request, router) # 2.40μs -> 1.80μs (33.4% faster)

def test_basic_model_absent():
    # Router with models, request model not present
    router = Router(model_list=[{"model_name": "gpt-3.5-turbo"}, {"model_name": "gpt-4"}])
    request = {"model": "gpt-neo"}
    codeflash_output = is_passthrough_request_using_router_model(request, router) # 2.10μs -> 1.56μs (34.5% faster)

def test_basic_empty_router():
    # Router with no models, any request should return False
    router = Router(model_list=[])
    request = {"model": "gpt-3.5-turbo"}
    codeflash_output = is_passthrough_request_using_router_model(request, router) # 1.76μs -> 1.69μs (4.32% faster)

def test_basic_none_router():
    # Router is None, should return False
    request = {"model": "gpt-3.5-turbo"}
    codeflash_output = is_passthrough_request_using_router_model(request, None) # 793ns -> 723ns (9.68% faster)

def test_basic_none_model():
    # Request 'model' is None, should return False
    router = Router(model_list=[{"model_name": "gpt-3.5-turbo"}])
    request = {"model": None}
    codeflash_output = is_passthrough_request_using_router_model(request, router) # 788ns -> 769ns (2.47% faster)

def test_basic_missing_model_key():
    # Request missing 'model' key, should return False
    router = Router(model_list=[{"model_name": "gpt-3.5-turbo"}])
    request = {}
    codeflash_output = is_passthrough_request_using_router_model(request, router) # 774ns -> 703ns (10.1% faster)

def test_basic_model_name_empty_string():
    # Model name is empty string, should return True only if router has empty string model
    router = Router(model_list=[{"model_name": ""}])
    request = {"model": ""}
    codeflash_output = is_passthrough_request_using_router_model(request, router) # 2.10μs -> 2.76μs (23.9% slower)

def test_basic_model_name_empty_string_absent():
    # Model name is empty string, router does not have it
    router = Router(model_list=[{"model_name": "gpt-3.5-turbo"}])
    request = {"model": ""}
    codeflash_output = is_passthrough_request_using_router_model(request, router) # 2.02μs -> 1.69μs (19.5% faster)

# 2. Edge Test Cases

def test_edge_model_name_case_sensitivity():
    # Model names are case-sensitive
    router = Router(model_list=[{"model_name": "GPT-3.5-TURBO"}])
    request = {"model": "gpt-3.5-turbo"}
    codeflash_output = is_passthrough_request_using_router_model(request, router) # 2.02μs -> 1.82μs (10.6% faster)

def test_edge_model_name_whitespace():
    # Model names with leading/trailing whitespace
    router = Router(model_list=[{"model_name": " gpt-3.5-turbo "}, {"model_name": "gpt-4"}])
    request = {"model": " gpt-3.5-turbo "}
    codeflash_output = is_passthrough_request_using_router_model(request, router) # 2.15μs -> 1.73μs (24.4% faster)
    request2 = {"model": "gpt-3.5-turbo"}
    codeflash_output = is_passthrough_request_using_router_model(request2, router) # 1.00μs -> 636ns (57.2% faster)

def test_edge_model_name_special_characters():
    # Model names with special characters
    router = Router(model_list=[{"model_name": "gpt-3.5-turbo!"}])
    request = {"model": "gpt-3.5-turbo!"}
    codeflash_output = is_passthrough_request_using_router_model(request, router) # 1.92μs -> 1.57μs (21.7% faster)
    request2 = {"model": "gpt-3.5-turbo"}
    codeflash_output = is_passthrough_request_using_router_model(request2, router) # 922ns -> 669ns (37.8% faster)

def test_edge_router_with_duplicate_model_names():
    # Router with duplicate model names
    router = Router(model_list=[{"model_name": "gpt-4"}, {"model_name": "gpt-4"}])
    request = {"model": "gpt-4"}
    codeflash_output = is_passthrough_request_using_router_model(request, router) # 1.97μs -> 1.66μs (18.3% faster)

def test_edge_model_name_numeric():
    # Model names are numeric strings
    router = Router(model_list=[{"model_name": "12345"}])
    request = {"model": "12345"}
    codeflash_output = is_passthrough_request_using_router_model(request, router) # 1.93μs -> 2.37μs (18.6% slower)

def test_edge_model_name_none_in_router():
    # Router has None as a model name (should be handled gracefully)
    router = Router(model_list=[{"model_name": None}])
    request = {"model": None}
    codeflash_output = is_passthrough_request_using_router_model(request, router) # 740ns -> 691ns (7.09% faster)

def test_edge_model_name_boolean():
    # Model names are boolean strings
    router = Router(model_list=[{"model_name": "True"}, {"model_name": "False"}])
    request = {"model": "True"}
    codeflash_output = is_passthrough_request_using_router_model(request, router) # 2.02μs -> 2.71μs (25.7% slower)
    request2 = {"model": "False"}
    codeflash_output = is_passthrough_request_using_router_model(request2, router) # 843ns -> 769ns (9.62% faster)

def test_edge_router_is_not_router_instance():
    # llm_router is not a Router instance, should handle gracefully
    class DummyRouter:
        def get_model_names(self, team_id=None):
            return ["gpt-3.5-turbo"]
    dummy_router = DummyRouter()
    request = {"model": "gpt-3.5-turbo"}
    codeflash_output = is_passthrough_request_using_router_model(request, dummy_router) # 1.61μs -> 1.70μs (5.34% slower)


def test_edge_router_model_names_returns_non_list():
    # get_model_names returns something other than a list
    class WeirdRouter:
        def get_model_names(self, team_id=None):
            return None
    weird_router = WeirdRouter()
    request = {"model": "gpt-3.5-turbo"}
    codeflash_output = is_passthrough_request_using_router_model(request, weird_router) # 2.44μs -> 1.74μs (40.3% faster)

def test_edge_router_get_model_names_raises():
    # get_model_names raises an exception
    class FailingRouter:
        def get_model_names(self, team_id=None):
            raise Exception("fail")
    failing_router = FailingRouter()
    request = {"model": "gpt-3.5-turbo"}
    codeflash_output = is_passthrough_request_using_router_model(request, failing_router) # 2.06μs -> 2.35μs (12.3% slower)

def test_edge_model_name_is_int():
    # Model name is an integer
    router = Router(model_list=[{"model_name": "123"}])
    request = {"model": 123}
    codeflash_output = is_passthrough_request_using_router_model(request, router) # 2.22μs -> 1.67μs (32.8% faster)

def test_edge_model_name_is_list():
    # Model name is a list
    router = Router(model_list=[{"model_name": "gpt-3.5-turbo"}])
    request = {"model": ["gpt-3.5-turbo"]}
    codeflash_output = is_passthrough_request_using_router_model(request, router) # 3.10μs -> 3.61μs (13.9% slower)

def test_edge_router_with_large_number_of_models_and_empty_model_name():
    # Router with many models, request with empty string
    router = Router(model_list=[{"model_name": f"model-{i}"} for i in range(100)] + [{"model_name": ""}])
    request = {"model": ""}
    codeflash_output = is_passthrough_request_using_router_model(request, router) # 13.2μs -> 11.6μs (13.9% faster)

# 3. Large Scale Test Cases

def test_large_scale_many_models_present():
    # Router with 1000 models, request matches one
    model_names = [f"model-{i}" for i in range(1000)]
    router = Router(model_list=[{"model_name": name} for name in model_names])
    request = {"model": "model-999"}
    codeflash_output = is_passthrough_request_using_router_model(request, router) # 80.1μs -> 1.85μs (4232% faster)

def test_large_scale_many_models_absent():
    # Router with 1000 models, request does not match
    model_names = [f"model-{i}" for i in range(1000)]
    router = Router(model_list=[{"model_name": name} for name in model_names])
    request = {"model": "model-not-present"}
    codeflash_output = is_passthrough_request_using_router_model(request, router) # 77.1μs -> 67.7μs (13.9% faster)

def test_large_scale_duplicate_models():
    # Router with 500 duplicate models, request matches
    router = Router(model_list=[{"model_name": "dup-model"} for _ in range(500)])
    request = {"model": "dup-model"}
    codeflash_output = is_passthrough_request_using_router_model(request, router) # 21.3μs -> 22.2μs (4.19% slower)

def test_large_scale_many_requests():
    # Many requests to the same router, all should be True
    model_names = [f"model-{i}" for i in range(1000)]
    router = Router(model_list=[{"model_name": name} for name in model_names])
    for i in range(1000):
        request = {"model": f"model-{i}"}
        codeflash_output = is_passthrough_request_using_router_model(request, router) # 41.8ms -> 431μs (9590% faster)

def test_large_scale_many_requests_absent():
    # Many requests to the same router, none should be True
    model_names = [f"model-{i}" for i in range(1000)]
    router = Router(model_list=[{"model_name": name} for name in model_names])
    for i in range(1000):
        request = {"model": f"absent-model-{i}"}
        codeflash_output = is_passthrough_request_using_router_model(request, router) # 41.8ms -> 437μs (9456% faster)

def test_large_scale_router_with_long_model_names():
    # Router with very long model names
    long_name = "gpt-" + "x" * 500
    router = Router(model_list=[{"model_name": long_name}])
    request = {"model": long_name}
    codeflash_output = is_passthrough_request_using_router_model(request, router) # 2.56μs -> 3.15μs (18.5% slower)

def test_large_scale_router_with_varied_types():
    # Router with model names of varied types (should only match exact string)
    router = Router(model_list=[
        {"model_name": "gpt-3.5-turbo"},
        {"model_name": 123},
        {"model_name": None},
        {"model_name": ""},
        {"model_name": "True"},
    ])
    codeflash_output = is_passthrough_request_using_router_model({"model": "gpt-3.5-turbo"}, router) # 3.03μs -> 3.12μs (2.85% slower)
    codeflash_output = is_passthrough_request_using_router_model({"model": 123}, router) # 1.36μs -> 674ns (102% faster)
    codeflash_output = is_passthrough_request_using_router_model({"model": None}, router) # 353ns -> 346ns (2.02% faster)
    codeflash_output = is_passthrough_request_using_router_model({"model": ""}, router) # 1.08μs -> 455ns (136% faster)
    codeflash_output = is_passthrough_request_using_router_model({"model": "True"}, router) # 898ns -> 374ns (140% faster)
# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.
#------------------------------------------------
import pytest
from litellm.proxy.pass_through_endpoints.llm_passthrough_endpoints import \
    is_passthrough_request_using_router_model


# Minimal Router class for testing purposes
class Router:
    def __init__(self, model_list=None):
        self._model_list = model_list or []
        # model_list is a list of dicts with "model_name" key
    def get_model_names(self, team_id=None):
        # Returns all model names in the router
        return [m.get("model_name", "") for m in self._model_list]

# ----------- UNIT TESTS ------------

# 1. Basic Test Cases

def test_basic_model_present():
    # Router with two models
    router = Router(model_list=[
        {"model_name": "gpt-3.5-turbo"},
        {"model_name": "gpt-4"}
    ])
    # Request with a model that exists
    req = {"model": "gpt-3.5-turbo"}
    codeflash_output = is_passthrough_request_using_router_model(req, router) # 2.28μs -> 2.83μs (19.4% slower)

def test_basic_model_absent():
    # Router with one model
    router = Router(model_list=[
        {"model_name": "gpt-3.5-turbo"}
    ])
    # Request with a model that does NOT exist
    req = {"model": "gpt-4"}
    codeflash_output = is_passthrough_request_using_router_model(req, router) # 1.95μs -> 1.45μs (34.7% faster)

def test_basic_empty_model_list():
    # Router with no models
    router = Router(model_list=[])
    # Request with any model
    req = {"model": "gpt-3.5-turbo"}
    codeflash_output = is_passthrough_request_using_router_model(req, router) # 1.83μs -> 2.44μs (25.0% slower)

def test_basic_none_router():
    # Router is None
    req = {"model": "gpt-3.5-turbo"}
    codeflash_output = is_passthrough_request_using_router_model(req, None) # 773ns -> 732ns (5.60% faster)

def test_basic_none_model():
    # Model is None
    router = Router(model_list=[
        {"model_name": "gpt-3.5-turbo"}
    ])
    req = {"model": None}
    codeflash_output = is_passthrough_request_using_router_model(req, router) # 789ns -> 731ns (7.93% faster)

def test_basic_no_model_key():
    # Request body missing "model" key
    router = Router(model_list=[
        {"model_name": "gpt-3.5-turbo"}
    ])
    req = {}
    codeflash_output = is_passthrough_request_using_router_model(req, router) # 773ns -> 693ns (11.5% faster)

def test_basic_model_name_empty_string():
    # Model is empty string
    router = Router(model_list=[
        {"model_name": "gpt-3.5-turbo"}
    ])
    req = {"model": ""}
    codeflash_output = is_passthrough_request_using_router_model(req, router) # 2.20μs -> 2.81μs (21.7% slower)

def test_basic_router_with_empty_model_name():
    # Router has an empty model name
    router = Router(model_list=[
        {"model_name": ""}
    ])
    req = {"model": ""}
    codeflash_output = is_passthrough_request_using_router_model(req, router) # 2.11μs -> 2.58μs (18.2% slower)

# 2. Edge Test Cases

def test_edge_model_name_case_sensitivity():
    # Router model names are case-sensitive
    router = Router(model_list=[
        {"model_name": "GPT-3.5-TURBO"}
    ])
    req = {"model": "gpt-3.5-turbo"}
    codeflash_output = is_passthrough_request_using_router_model(req, router) # 2.06μs -> 2.49μs (17.3% slower)

def test_edge_model_name_whitespace():
    # Model name with leading/trailing whitespace
    router = Router(model_list=[
        {"model_name": "gpt-3.5-turbo"}
    ])
    req = {"model": " gpt-3.5-turbo "}
    codeflash_output = is_passthrough_request_using_router_model(req, router) # 2.08μs -> 2.50μs (16.7% slower)

def test_edge_model_name_special_characters():
    # Model name with special characters
    router = Router(model_list=[
        {"model_name": "gpt-3.5-turbo$"}
    ])
    req = {"model": "gpt-3.5-turbo$"}
    codeflash_output = is_passthrough_request_using_router_model(req, router) # 1.99μs -> 2.52μs (21.2% slower)

def test_edge_model_name_numeric():
    # Model name is numeric
    router = Router(model_list=[
        {"model_name": "12345"}
    ])
    req = {"model": "12345"}
    codeflash_output = is_passthrough_request_using_router_model(req, router) # 1.96μs -> 2.50μs (21.6% slower)

def test_edge_router_model_list_with_duplicates():
    # Router with duplicate model names
    router = Router(model_list=[
        {"model_name": "gpt-3.5-turbo"},
        {"model_name": "gpt-3.5-turbo"}
    ])
    req = {"model": "gpt-3.5-turbo"}
    codeflash_output = is_passthrough_request_using_router_model(req, router) # 2.21μs -> 2.78μs (20.4% slower)

def test_edge_router_model_name_none():
    # Router has a model with None as name
    router = Router(model_list=[
        {"model_name": None}
    ])
    req = {"model": None}
    # Should be False, None is not a valid model name to match
    codeflash_output = is_passthrough_request_using_router_model(req, router) # 777ns -> 661ns (17.5% faster)


def test_edge_router_get_model_names_returns_non_list():
    # Monkeypatch Router to return non-list from get_model_names
    class BrokenRouter(Router):
        def get_model_names(self, team_id=None):
            return None  # Should be list
    router = BrokenRouter(model_list=[
        {"model_name": "gpt-3.5-turbo"}
    ])
    req = {"model": "gpt-3.5-turbo"}
    # Should not crash, just not find the model
    codeflash_output = is_passthrough_request_using_router_model(req, router) # 2.82μs -> 3.27μs (13.7% slower)

def test_edge_router_with_non_str_model_names():
    # Router has model names that are not strings
    router = Router(model_list=[
        {"model_name": 123},
        {"model_name": "gpt-4"}
    ])
    req = {"model": 123}
    codeflash_output = is_passthrough_request_using_router_model(req, router) # 2.71μs -> 3.13μs (13.4% slower)
    req2 = {"model": "gpt-4"}
    codeflash_output = is_passthrough_request_using_router_model(req2, router) # 998ns -> 718ns (39.0% faster)

def test_edge_model_name_is_object():
    # Model name is a non-string object
    router = Router(model_list=[
        {"model_name": ("gpt", "3.5", "turbo")}
    ])
    req = {"model": ("gpt", "3.5", "turbo")}
    codeflash_output = is_passthrough_request_using_router_model(req, router) # 2.24μs -> 2.75μs (18.7% slower)


def test_edge_request_body_with_extra_keys():
    # Request body has extra keys
    router = Router(model_list=[
        {"model_name": "gpt-3.5-turbo"}
    ])
    req = {"model": "gpt-3.5-turbo", "other": 123, "foo": "bar"}
    codeflash_output = is_passthrough_request_using_router_model(req, router) # 2.85μs -> 3.38μs (15.6% slower)

def test_edge_router_model_list_with_missing_model_name_key():
    # Router model_list missing "model_name" key
    router = Router(model_list=[
        {"not_model_name": "gpt-3.5-turbo"}
    ])
    req = {"model": "gpt-3.5-turbo"}
    # Should not find the model
    codeflash_output = is_passthrough_request_using_router_model(req, router) # 2.22μs -> 2.74μs (18.8% slower)

def test_edge_router_model_list_with_empty_dict():
    # Router model_list contains empty dict
    router = Router(model_list=[
        {}
    ])
    req = {"model": ""}
    codeflash_output = is_passthrough_request_using_router_model(req, router) # 2.15μs -> 2.65μs (18.9% slower)

# 3. Large Scale Test Cases

def test_large_scale_many_models_present():
    # Router with 1000 models, request matches one
    model_names = [f"model-{i}" for i in range(1000)]
    router = Router(model_list=[{"model_name": name} for name in model_names])
    req = {"model": "model-999"}
    codeflash_output = is_passthrough_request_using_router_model(req, router) # 79.6μs -> 1.97μs (3933% faster)

def test_large_scale_many_models_absent():
    # Router with 1000 models, request does not match any
    model_names = [f"model-{i}" for i in range(1000)]
    router = Router(model_list=[{"model_name": name} for name in model_names])
    req = {"model": "model-1001"}
    codeflash_output = is_passthrough_request_using_router_model(req, router) # 77.5μs -> 2.05μs (3688% faster)

def test_large_scale_all_empty_model_names():
    # Router with 1000 empty model names, request is empty string
    router = Router(model_list=[{"model_name": ""} for _ in range(1000)])
    req = {"model": ""}
    codeflash_output = is_passthrough_request_using_router_model(req, router) # 39.0μs -> 2.55μs (1432% faster)

def test_large_scale_all_none_model_names():
    # Router with 1000 None model names, request is None
    router = Router(model_list=[{"model_name": None} for _ in range(1000)])
    req = {"model": None}
    codeflash_output = is_passthrough_request_using_router_model(req, router) # 860ns -> 806ns (6.70% faster)

def test_large_scale_model_names_are_integers():
    # Router with 1000 integer model names, request matches one
    router = Router(model_list=[{"model_name": i} for i in range(1000)])
    req = {"model": 500}
    codeflash_output = is_passthrough_request_using_router_model(req, router) # 44.1μs -> 2.30μs (1816% faster)

def test_large_scale_model_names_are_tuples():
    # Router with 1000 tuple model names, request matches one
    router = Router(model_list=[{"model_name": (i, i+1)} for i in range(1000)])
    req = {"model": (123, 124)}
    codeflash_output = is_passthrough_request_using_router_model(req, router) # 65.6μs -> 2.28μs (2774% faster)

def test_large_scale_model_names_are_mixed_types():
    # Router with 1000 model names, mixed types
    model_names = []
    for i in range(500):
        model_names.append(str(i))
    for i in range(500, 1000):
        model_names.append(i)
    router = Router(model_list=[{"model_name": name} for name in model_names])
    req_str = {"model": "499"}
    req_int = {"model": 750}
    codeflash_output = is_passthrough_request_using_router_model(req_str, router) # 69.2μs -> 62.7μs (10.5% faster)
    codeflash_output = is_passthrough_request_using_router_model(req_int, router) # 57.9μs -> 1.09μs (5222% faster)

def test_large_scale_request_body_with_large_extra_keys():
    # Request body has many extra keys, but correct model
    router = Router(model_list=[
        {"model_name": "gpt-3.5-turbo"}
    ])
    req = {"model": "gpt-3.5-turbo"}
    for i in range(999):
        req[f"extra_{i}"] = i
    codeflash_output = is_passthrough_request_using_router_model(req, router) # 2.21μs -> 2.89μs (23.6% slower)

def test_large_scale_router_with_duplicate_models():
    # Router with 1000 duplicate model names
    router = Router(model_list=[{"model_name": "gpt-3.5-turbo"} for _ in range(1000)])
    req = {"model": "gpt-3.5-turbo"}
    codeflash_output = is_passthrough_request_using_router_model(req, router) # 39.6μs -> 40.6μs (2.35% slower)

def test_large_scale_router_with_alternating_none_and_str():
    # Router with alternating None and "gpt-3.5-turbo"
    model_list = []
    for i in range(1000):
        if i % 2 == 0:
            model_list.append({"model_name": None})
        else:
            model_list.append({"model_name": "gpt-3.5-turbo"})
    router = Router(model_list=model_list)
    req1 = {"model": "gpt-3.5-turbo"}
    req2 = {"model": None}
    codeflash_output = is_passthrough_request_using_router_model(req1, router) # 42.1μs -> 41.7μs (0.935% faster)
    codeflash_output = is_passthrough_request_using_router_model(req2, router) # 525ns -> 519ns (1.16% faster)
# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.
```

To edit these changes, run `git checkout codeflash/optimize-is_passthrough_request_using_router_model-mh1bq9p4` and push.

codeflash-ai[bot] requested a review from mashraf-222 on October 22, 2025 at 01:38
codeflash-ai[bot] added the ⚡️ codeflash (Optimization PR opened by Codeflash AI) label on Oct 22, 2025